Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass over the data. Rehearsal-based methods attempt to approximate the observed input distribution with a small memory and revisit those samples later to avoid forgetting. Despite their strong empirical performance, rehearsal methods still suffer from a discrepancy between the loss landscape of past data and that of the memory samples. This paper revisits rehearsal dynamics in the online setting. We provide theoretical insights into the inherent risk of memory overfitting from the perspective of biased and dynamic empirical risk minimization, and examine the merits and limits of repeated rehearsal. Motivated by this analysis, we design a simple and intuitive baseline, repeated augmented rehearsal (RAR), to address the underfitting-overfitting dilemma of online rehearsal. Surprisingly, across four rather different OCL benchmarks, this simple baseline outperforms vanilla rehearsal by 9%-17% and also significantly improves the state-of-the-art rehearsal-based methods MIR, ASER, and SCR. We further show that RAR achieves an accurate approximation of the loss landscape of past data and exhibits high-loss-ridge aversion in its learning trajectory. Extensive ablation studies investigate the interplay between repeated and augmented rehearsal, and reinforcement learning (RL) is used to dynamically adjust RAR's hyperparameters to balance the stability-plasticity trade-off online.
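To make the rehearsal loop concrete, here is a minimal PyTorch sketch of one RAR update; the replay-buffer API (`memory.sample`/`memory.update`) and the augmentation choices are illustrative assumptions, not the authors' code.

```python
import torch
import torchvision.transforms as T

# Illustrative augmentation pipeline; the paper's exact choices may differ.
augment = T.Compose([T.RandomResizedCrop(32, scale=(0.6, 1.0)),
                     T.RandomHorizontalFlip()])

def rar_step(model, optimizer, criterion, stream_x, stream_y,
             memory, num_repeats=4):
    """One online step: rehearse the same memory batch several times,
    re-augmenting it on every repeat to curb memory overfitting."""
    mem_x, mem_y = memory.sample(batch_size=stream_x.size(0))
    for _ in range(num_repeats):
        # Fresh augmentation per repeat (applied batch-wise here for
        # brevity; per-sample augmentation would be stronger).
        x = torch.cat([stream_x, augment(mem_x)])
        y = torch.cat([stream_y, mem_y])
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    memory.update(stream_x, stream_y)  # e.g. reservoir sampling
```

The repeat count and augmentation strength are exactly the hyperparameters the abstract proposes tuning online with RL.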
Machine learning applications must often cope with dynamic environments in which data are collected in the form of continuous data streams of potentially unbounded length and transient behavior. Compared with traditional (batch) data mining, stream processing algorithms face additional requirements regarding computational resources and adaptability to data evolution. They must process instances incrementally, because the continuous flow of data prohibits storing it for multiple passes. Ensemble learning has achieved remarkable predictive performance in this scenario. Implemented as a set of (several) individual classifiers, ensembles lend themselves naturally to task parallelism. However, the incremental learning and dynamic data structures used to capture concept drift increase cache misses and hinder the benefits of parallelism. This paper proposes a mini-batching strategy that improves memory access locality and the performance of several ensemble algorithms for stream mining in multi-core environments. With the aid of a formal framework, we show that mini-batching can significantly reduce the reuse distance (and hence the number of cache misses). Experiments applying six different state-of-the-art ensemble algorithms to four benchmark datasets with varied characteristics show speedups of up to 5x on an 8-core processor. These benefits come at the cost of a small reduction in predictive performance.
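The locality argument is easy to see in code: processing one instance at a time touches every member's state per instance, whereas mini-batching lets each member's data structures stay cache-resident for a whole batch. A minimal sketch follows (the paper targets compiled multi-core stream-mining frameworks; this Python version only illustrates the access pattern):

```python
def train_one_at_a_time(ensemble, stream):
    # Baseline: every instance touches every member's state in turn,
    # so each member's structures are evicted before they are reused.
    for x, y in stream:
        for member in ensemble:
            member.partial_fit(x, y)

def train_mini_batched(ensemble, stream, batch_size=64):
    # Mini-batching: each member consumes the whole batch consecutively,
    # shortening the reuse distance of its internal data structures.
    batch = []
    for x, y in stream:
        batch.append((x, y))
        if len(batch) == batch_size:
            for member in ensemble:  # members can also run in parallel
                for bx, by in batch:
                    member.partial_fit(bx, by)
            batch.clear()
```

The batch size trades locality (and speedup) against how quickly the ensemble reacts to drift, which is where the small loss in predictive performance comes from.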
Multi-label learning predicts a subset of labels from a given label set for an instance while taking label correlations into account. A known challenge in multi-label classification is the long-tailed distribution of labels. Many studies focus on improving the overall predictions of a model and therefore do not prioritize tail-end labels. Improving tail-end label prediction in multi-label classification of medical text enables a better understanding of patients and better care: the knowledge gained from one or more infrequent labels can affect the reasoning behind medical decisions and treatment plans. This study presents variations of concatenated domain-specific language models, including multi-BioMed-Transformers, to achieve two primary goals: first, to improve the F1 scores of infrequent labels, particularly long-tail labels, in multi-label problems; second, to handle long medical text and multi-sourced electronic health records (EHRs), a challenging task for standard transformers designed to work on short input sequences. An important contribution of this study is the new state-of-the-art (SOTA) results obtained with Transformer-XL for predicting medical codes. A variety of experiments are performed on the Medical Information Mart for Intensive Care (MIMIC-III) database. The results show that concatenated BioMed-Transformers outperform standard transformers in terms of overall micro and macro F1 scores as well as individual F1 scores of tail-end labels, while requiring lower training times than existing transformer-based solutions for long input sequences.
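A sketch of the concatenation idea, assuming a HuggingFace-style encoder that returns `last_hidden_state` (the paper's exact architecture and pooling may differ):

```python
import torch
import torch.nn as nn

class ConcatSegmentClassifier(nn.Module):
    """Encode each segment of a long document separately, concatenate
    the pooled vectors, and classify all labels jointly."""
    def __init__(self, encoder, hidden=768, num_segments=4, num_labels=50):
        super().__init__()
        self.encoder = encoder  # e.g. a domain-specific BioMed transformer
        self.head = nn.Linear(hidden * num_segments, num_labels)

    def forward(self, segment_ids, segment_masks):
        # segment_ids, segment_masks: (batch, num_segments, seq_len)
        pooled = [self.encoder(ids, attention_mask=m).last_hidden_state[:, 0]
                  for ids, m in zip(segment_ids.unbind(1),
                                    segment_masks.unbind(1))]
        return torch.sigmoid(self.head(torch.cat(pooled, dim=-1)))
```

Sigmoid outputs give independent per-label probabilities, as multi-label code prediction requires; Transformer-XL instead handles long inputs natively via segment-level recurrence.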
Curating datasets for object segmentation is a difficult task. With the advent of large-scale pre-trained generative models, conditional image generation has been given a significant boost in result quality and ease of use. In this paper, we present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions, without requiring segmentation labels. We leverage and explore pre-trained latent diffusion models to automatically generate weak segmentation masks for concepts and objects. The masks are then used to fine-tune the diffusion model on an inpainting task, which enables fine-grained removal of the object while at the same time providing a synthetic foreground and background dataset. We demonstrate that this method outperforms previous methods in both discriminative and generative performance and closes the gap with fully supervised training while requiring no pixel-wise object labels. We show results on the task of segmenting four different objects (humans, dogs, cars, birds).
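The generation pipeline can be summarized as a loop over three diffusion-model operations; the callables below are placeholders for text-to-image sampling, cross-attention mask extraction, and the fine-tuned inpainting step, not a real API:

```python
def build_synthetic_dataset(generate, weak_mask, inpaint, prompt, n_samples):
    """Sketch of the dataset-generation loop described in the abstract."""
    dataset = []
    for _ in range(n_samples):
        image = generate(prompt)           # sample an image containing the concept
        mask = weak_mask(image, prompt)    # weak segmentation mask for the concept
        background = inpaint(image, mask)  # fine-grained object removal
        dataset.append((image, mask, background))  # synthetic fg/bg triple
    return dataset
```

The (image, mask, background) triples are then enough to train a segmentation model without any pixel-wise human labels.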
Generated texts from large pretrained language models have been shown to exhibit a variety of harmful, human-like biases about various demographics. These findings prompted large efforts aiming to understand and measure such effects, with the goal of providing benchmarks that can guide the development of techniques mitigating these stereotypical associations. However, as recent research has pointed out, the current benchmarks lack a robust experimental setup, consequently hindering the inference of meaningful conclusions from their evaluation metrics. In this paper, we extend these arguments and demonstrate that existing techniques and benchmarks aiming to measure stereotypes tend to be inaccurate and contain a high degree of experimental noise that severely limits the knowledge we can gain from benchmarking language models based on them. Accordingly, we propose a new framework for robustly measuring and quantifying biases exhibited by generative language models. Finally, we use this framework to investigate GPT-3's occupational gender bias and propose prompting techniques for mitigating these biases without the need for fine-tuning.
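As an illustration of the kind of prompting-based measurement this enables, one can estimate gendered-pronoun rates in continuations of occupation prompts; `complete` stands in for any text-completion call (e.g. a GPT-3 wrapper), and the prompt template is a hypothetical example, not the paper's protocol:

```python
from collections import Counter

FEMALE = {"she", "her", "hers"}
MALE = {"he", "him", "his"}

def pronoun_rates(complete, occupation, n_samples=100):
    """Fraction of sampled continuations containing each pronoun class."""
    prompt = f"The {occupation} said that"
    counts = Counter()
    for _ in range(n_samples):
        tokens = {t.strip(".,;!?") for t in complete(prompt).lower().split()}
        counts["female"] += bool(tokens & FEMALE)
        counts["male"] += bool(tokens & MALE)
    return {k: v / n_samples for k, v in counts.items()}
```

Repeating such measurements over many templates and sampling seeds is what separates a robust estimate from the experimental noise the paper criticizes.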
Machine learning methods like neural networks are extremely successful and popular in a variety of applications, however, they come at substantial computational costs, accompanied by high energy demands. In contrast, hardware capabilities are limited and there is evidence that technology scaling is stuttering; therefore, new approaches to meet the performance demands of increasingly complex model architectures are required. As an unsafe optimization, noisy computations are more energy efficient, and given a fixed power budget also more time efficient. However, any kind of unsafe optimization requires countermeasures to ensure functionally correct results. This work considers noisy computations in an abstract form, and aims to understand the implications of such noise on the accuracy of neural-network-based classifiers as an exemplary workload. We propose a methodology called "Walking Noise" that allows assessing the robustness of different layers of deep architectures by means of a so-called "midpoint noise level" metric. We then investigate the implications of additive and multiplicative noise for different classification tasks and model architectures, with and without batch normalization. While noisy training significantly increases robustness for both noise types, we observe a clear trend to increase weights and thus increase the signal-to-noise ratio for additive noise injection. For the multiplicative case, we find that some networks, with suitably simple tasks, automatically learn an internal binary representation, hence becoming extremely robust. Overall this work proposes a method to measure the layer-specific robustness and shares first insights on how networks learn to compensate for injected noise, and thus contributes to understanding robustness against noisy computations.
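The probing mechanism can be sketched as a noise-injection layer placed at one position in the network; sweeping its level and recording test accuracy yields the robustness curve from which the midpoint noise level is read off (a sketch of the idea, not the authors' implementation):

```python
import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    """Inject additive or multiplicative Gaussian noise at one layer."""
    def __init__(self, level=0.0, mode="additive"):
        super().__init__()
        self.level, self.mode = level, mode

    def forward(self, x):
        if self.mode == "additive":
            return x + self.level * torch.randn_like(x)
        return x * (1.0 + self.level * torch.randn_like(x))

# Usage: insert one NoiseInjection after the layer under test, sweep
# `level` upward, and record accuracy. The "midpoint noise level" is the
# level at which accuracy drops halfway from its clean value to chance.
```

Training with the injection active ("noisy training") is what produces the compensation effects the abstract describes, such as growing weights under additive noise.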
We describe an approach for empirical modeling of steel phase kinetics based on symbolic regression and genetic programming. The algorithm takes processed data gathered from dilatometer measurements and produces a system of differential equations that models the phase kinetics. Our initial results demonstrate that the proposed approach can identify compact differential equations that fit the data. The model predicts ferrite, pearlite, and bainite formation for a single steel type. Martensite is not yet included in the model. Future work will incorporate martensite and generalize to multiple steel types with different chemical compositions.
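The output of such a system is a set of coupled ODEs for the phase fractions, which can then be integrated against a cooling schedule. The right-hand sides below are invented placeholders purely to show the form of the model; the actual expressions are what symbolic regression discovers from the dilatometer data:

```python
import numpy as np
from scipy.integrate import solve_ivp

def phase_kinetics(t, y, temperature):
    """Illustrative ODE system for ferrite (f), pearlite (p), bainite (b)."""
    f, p, b = y
    remaining = max(1.0 - f - p - b, 0.0)  # untransformed austenite fraction
    T = temperature(t)
    df = 0.020 * remaining * np.exp(-((T - 700) / 60) ** 2)
    dp = 0.010 * remaining * np.exp(-((T - 620) / 50) ** 2)
    db = 0.015 * remaining * np.exp(-((T - 480) / 40) ** 2)
    return [df, dp, db]

cooling = lambda t: 800.0 - 5.0 * t  # linear cooling schedule in degC
sol = solve_ivp(phase_kinetics, (0.0, 80.0), [0.0, 0.0, 0.0], args=(cooling,))
```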
We introduce ensembles of stochastic neural networks to approximate the Bayesian posterior, combining stochastic methods such as dropout with deep ensembles. The stochastic ensembles are formulated as families of distributions and trained to approximate the Bayesian posterior with variational inference. We implement stochastic ensembles based on Monte Carlo dropout, DropConnect and a novel non-parametric version of dropout and evaluate them on a toy problem and CIFAR image classification. For CIFAR, the stochastic ensembles are quantitatively compared to published Hamiltonian Monte Carlo results for a ResNet-20 architecture. We also test the quality of the posteriors directly against Hamiltonian Monte Carlo simulations in a simplified toy model. Our results show that in a number of settings, stochastic ensembles provide more accurate posterior estimates than regular deep ensembles.
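The test-time behaviour is simple to state in code: the posterior predictive is approximated by averaging over both ensemble members and stochastic forward passes. A minimal PyTorch sketch follows (switching the whole model to train mode is a simplification; in practice only the dropout layers would be kept stochastic):

```python
import torch

@torch.no_grad()
def stochastic_ensemble_predict(members, x, n_samples=10):
    """Average softmax outputs over members and dropout samples."""
    probs = []
    for model in members:
        model.train()  # keep dropout active at test time (simplified)
        for _ in range(n_samples):
            probs.append(torch.softmax(model(x), dim=-1))
    return torch.stack(probs).mean(dim=0)
```

A regular deep ensemble is the special case with deterministic members and `n_samples=1`, which is why the stochastic version can only widen the family of approximating distributions.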
Compressing neural network architectures is important to allow the deployment of models to embedded or mobile devices, and pruning and quantization are the major approaches to compress neural networks nowadays. Both methods benefit when compression parameters are selected specifically for each layer. Finding good combinations of compression parameters, so-called compression policies, is hard as the problem spans an exponentially large search space. Effective compression policies consider the influence of the specific hardware architecture on the used compression methods. We propose an algorithmic framework called Galen to search such policies using reinforcement learning utilizing pruning and quantization, thus providing automatic compression for neural networks. Contrary to other approaches we use inference latency measured on the target hardware device as an optimization goal. With that, the framework supports the compression of models specific to a given hardware target. We validate our approach using three different reinforcement learning agents for pruning, quantization and joint pruning and quantization. Besides proving the functionality of our approach we were able to compress a ResNet18 for CIFAR-10, on an embedded ARM processor, to 20% of the original inference latency without significant loss of accuracy. Moreover, we can demonstrate that a joint search and compression using pruning and quantization is superior to an individual search for policies using a single compression method.
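The search loop can be sketched abstractly; `compress`, `evaluate` and `measure_latency` are placeholders for pruning/quantization, validation, and on-device timing, and the reward shaping is an illustrative choice rather than Galen's actual formulation:

```python
def compression_search(agent, model, compress, evaluate, measure_latency,
                       episodes=100, target_latency_ms=20.0):
    """Latency-driven policy search over per-layer compression choices."""
    best = None
    for _ in range(episodes):
        policy = agent.propose()              # per-layer sparsity and bitwidth
        candidate = compress(model, policy)
        accuracy = evaluate(candidate)
        latency = measure_latency(candidate)  # measured on the target hardware
        agent.update(policy,
                     reward=accuracy - max(0.0, latency - target_latency_ms))
        if latency <= target_latency_ms and (best is None or accuracy > best[0]):
            best = (accuracy, policy)
    return best
```

Using measured latency instead of proxy metrics such as FLOPs is what ties the resulting policy to the specific target device.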
Early on during a pandemic, vaccine availability is limited, requiring prioritisation of different population groups. Evaluating vaccine allocation is therefore a crucial element of pandemics response. In the present work, we develop a model to retrospectively evaluate age-dependent counterfactual vaccine allocation strategies against the COVID-19 pandemic. To estimate the effect of allocation on the expected severe-case incidence, we employ a simulation-assisted causal modelling approach which combines a compartmental infection-dynamics simulation, a coarse-grained, data-driven causal model and literature estimates for immunity waning. We compare Israel's implemented vaccine allocation strategy in 2021 to counterfactual strategies such as no prioritisation, prioritisation of younger age groups or a strict risk-ranked approach; we find that Israel's implemented strategy was indeed highly effective. We also study the marginal impact of increasing vaccine uptake for a given age group and find that increasing vaccinations in the elderly is most effective at preventing severe cases, whereas additional vaccinations for middle-aged groups reduce infections most effectively. Due to its modular structure, our model can easily be adapted to study future pandemics. We demonstrate this flexibility by investigating vaccine allocation strategies for a pandemic with characteristics of the Spanish Flu. Our approach thus helps evaluate vaccination strategies under the complex interplay of core epidemic factors, including age-dependent risk profiles, immunity waning, vaccine availability and spreading rates.
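The compartmental component of such a model can be illustrated with a minimal age-stratified SIR simulation with vaccination; the parameters below are invented for illustration, and the paper additionally couples the simulation with a data-driven causal model and immunity waning:

```python
import numpy as np

def simulate(beta, gamma, vacc_per_day, efficacy, days):
    """Minimal age-stratified SIR with daily vaccination of susceptibles."""
    n = beta.shape[0]
    S = np.full(n, 1.0 / n - 1e-4)
    I = np.full(n, 1e-4)
    R = np.zeros(n)
    for _ in range(days):
        new_inf = (beta @ I) * S  # age-mixing via contact matrix
        recovered = gamma * I
        vacc = np.minimum(vacc_per_day, S) * efficacy
        S, I, R = S - new_inf - vacc, I + new_inf - recovered, R + recovered + vacc
    return S, I, R

beta = 0.3 * np.array([[1.0, 0.5, 0.2],
                       [0.5, 1.0, 0.4],
                       [0.2, 0.4, 1.0]])
# Counterfactual allocations differ only in how vacc_per_day is split by age.
S, I, R = simulate(beta, gamma=0.1,
                   vacc_per_day=np.array([0.000, 0.001, 0.004]),  # elderly first
                   efficacy=0.9, days=180)
```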